Background

Background Information

PM stands for particulate matter (also called particle pollution): the term for a mixture of solid particles and liquid droplets found in the air. Some particles, such as dust, dirt, soot, or smoke, are large or dark enough to be seen with the naked eye. Others are so small they can only be detected using an electron microscope.

These particles come in many sizes and shapes and can be made up of hundreds of different chemicals. Some are emitted directly from a source, such as construction sites, unpaved roads, fields, smokestacks or fires. Most particles form in the atmosphere as a result of complex reactions of chemicals such as sulfur dioxide and nitrogen oxides, which are pollutants emitted from power plants, industries and automobiles.

PM2.5: fine inhalable particles, with diameters that are generally 2.5 micrometers and smaller.

Data

The data analyzed on this website are the annual mean PM2.5 averaged over the period 2008 through 2010.

PM2.5 Distribution By County

Distribution of PM2.5 Values By County

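Apart from the outliers, the boxplot looks roughly symmetric, which is consistent with an approximately normal distribution. The median is about 10, the lower fence is near 5, and the upper fence is near 15 (all in micrograms per cubic meter). A few outliers lie above and below the fences.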

Highest Counties

The following counties exceed the previous requirement of no more than 15 micrograms per cubic meter of PM2.5:

# A tibble: 8 × 5
   pm25  fips region longitude latitude
  <dbl> <dbl> <chr>      <dbl>    <dbl>
1  16.2  6019 west       -120.     36.6
2  15.8  6029 west       -119.     35.3
3  18.4  6031 west       -120.     36.2
4  16.7  6037 west       -118.     34.1
5  15.0  6047 west       -121.     37.2
6  17.4  6065 west       -117.     33.8
7  16.3  6099 west       -121.     37.6
8  16.2  6107 west       -119.     36.2

These include Fresno County (fips 6019) and Kern County (fips 6029), among others. All eight are in the west region, and every fips code begins with 6, the state code for California.

Region PM2.5 Distribution

Box Plot Distribution by Region

Violin Plot Distribution By Region

Analysis

  • The east has a higher median pollution level, and more of its counties sit at higher pollution levels.

  • The west has a lower median and most of its counties sit at lower levels, but it also has some very high outliers.

  • The violin plot shows the shape of each region's distribution more clearly than the box plot does.

Histogram Analysis

The red vertical line in the histogram marks the current requirement: a maximum PM2.5 level of 12 micrograms per cubic meter.

Histogram Distribution by County

Histogram Distribution by Region

Analysis For Region Distribution

  • The east has many more counties than the west, as the larger bin counts show.

  • The east appears to be slightly skewed left, and the west appears to be skewed right.

  • The west has a greater range.

Scatterplot Analysis

Overlapping Distribution of PM2.5 vs Latitude by Region

Separated Distribution of PM2.5 vs Latitude by Region

Analysis

The east appears to have a nonlinear relationship between PM2.5 and latitude, while the west shows no apparent relationship between latitude and pollution.

In both regions, most counties lie at middle latitudes and few lie at the extremes.

Correlogram

Correlogram

Analysis

None of the variable pairs appears to be strongly correlated. The correlogram does suggest a weak negative linear relationship between latitude and pm25.

It also suggests a positive linear relationship between longitude and pm25, and a negative linear relationship between latitude and longitude.

---
title: "Midterm Part 2"
output: 
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: default
      navbar-bg: "red"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(DT)
library(plotly)
library(vioplot)
library(corrgram)
# Path on the original author's machine; point this at your local copy of avgpm25.csv
avgpm25 <- read_csv("C:/Users/ngles/Downloads/avgpm25.csv")
```

Background
===

Column {data-width=450}
---

### Background Information

PM stands for particulate matter (also called particle pollution): the term for a mixture of solid particles and liquid droplets found in the air. Some particles, such as dust, dirt, soot, or smoke, are large or dark enough to be seen with the naked eye. Others are so small they can only be detected using an electron microscope.

These particles come in many sizes and shapes and can be made up of hundreds of different chemicals. Some are emitted directly from a source, such as construction sites, unpaved roads, fields, smokestacks or fires. Most particles form in the atmosphere as a result of complex reactions of chemicals such as sulfur dioxide and nitrogen oxides, which are pollutants emitted from power plants, industries and automobiles.

**PM2.5:** fine inhalable particles, with diameters that are generally 2.5 micrometers and smaller.

Column {data-width=550}
---

### Data

The data analyzed on this website are the annual mean PM2.5 averaged over the period 2008 through 2010.

```{r show_table}
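# DataTables option names are case-sensitive: pageLength (not pagelength) sets rows per page
datatable(avgpm25[1:500, ], rownames = FALSE, colnames = c("PM2.5 Count", "Five Digit County Code (fips)", "Region", "Longitude", "Latitude"), options = list(pageLength = 20))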
```


PM2.5 Distribution By County
===

Column {data-width=500}
---

### Distribution of PM2.5 Values By County
```{r bxplt}
boxplot(avgpm25$pm25, ylab="Fine Particle Pollution (micrograms/cubic meter)", main="Boxplot of Fine particle pollution per County", col="#54d2d2")
```

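Apart from the outliers, the boxplot looks roughly symmetric, which is consistent with an approximately normal distribution. The median is about 10, the lower fence is near 5, and the upper fence is near 15 (all in micrograms per cubic meter). A few outliers lie above and below the fences.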
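
The values read off the boxplot can be checked numerically. The chunk below is a minimal sketch using base R's `boxplot.stats()`, which computes the same statistics that `boxplot()` draws. The helper name `box_summary` is an assumption made for illustration and is not part of the original analysis.

```{r bxstats}
# Hypothetical helper: the numbers behind a boxplot of x.
box_summary <- function(x) {
  s <- boxplot.stats(x)  # stats = whisker ends, hinges, and median
  list(
    lower_whisker = s$stats[1],   # lowest value that is not an outlier
    median        = s$stats[3],
    upper_whisker = s$stats[5],   # highest value that is not an outlier
    n_outliers    = length(s$out) # points beyond 1.5 * IQR from the hinges
  )
}

# Runs only inside the dashboard, where the setup chunk has loaded avgpm25.
if (exists("avgpm25")) print(unlist(box_summary(avgpm25$pm25)))
```

A boxplot shows symmetry but not the full shape of the distribution, so a normal Q-Q plot (`qqnorm()`) would be a reasonable follow-up before calling the data normal.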

Column {data-width=500}
---

### Highest Counties

The following counties exceed the previous requirement of no more than 15 micrograms per cubic meter of PM2.5:

```{r highcounties}
# Inside filter(), columns can be referenced by name without the avgpm25$ prefix
filter(avgpm25, pm25 > 15)
```

These include Fresno County and Kern County, among others. All of them are in the west region, specifically in California.
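
The California observation can be verified from the fips column itself. A five-digit county FIPS code is a two-digit state code followed by a three-digit county code, and California's state code is 06; because fips is stored as a number here, the leading zero is dropped (for example, 6019). The sketch below assumes that storage format, and the helper name `state_code` is hypothetical.

```{r statecode}
# Hypothetical helper: state part of a numeric county FIPS code.
# Integer division by 1000 strips the three-digit county part.
state_code <- function(fips) fips %/% 1000

# Runs only inside the dashboard, where the setup chunk has loaded avgpm25.
if (exists("avgpm25")) {
  high <- avgpm25[which(avgpm25$pm25 > 15), ]
  # Cross-tabulate state code against region for the high-pollution counties
  print(table(state = state_code(high$fips), region = high$region))
}
```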

Region PM2.5 Distribution
===

Column {.tabset data-width=500}
---

### Box Plot Distribution by Region

```{r bxpltbyregion}
boxplot(avgpm25$pm25~avgpm25$region, xlab="Region", ylab="Fine Particle Pollution (micrograms/cubic meter)", main="Boxplot of Fine particle pollution per County by Region", col="#54d2d2")
```

### Violin Plot Distribution By Region

```{r viopltbyregion}

vioplot(avgpm25$pm25~avgpm25$region, xlab="Region", ylab="Fine Particle Pollution (micrograms/cubic meter)", main="Violin plot of Fine particle pollution per County by Region", col="#54d2d2")
```

Column {data-width=500}
---

### Analysis

- The east has a higher median pollution level, and more of its counties sit at higher pollution levels.

- The west has a lower median and most of its counties sit at lower levels, but it also has some very high outliers.

- The violin plot shows the shape of each region's distribution more clearly than the box plot does.
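
The regional comparison above can be backed with numbers. The following is a minimal base R sketch that tabulates the quartiles and maximum of a value within each group; the helper name `group_quartiles` is an assumption for illustration.

```{r regionsummary}
# Hypothetical helper: quartiles and maximum of `value` within each `group`.
group_quartiles <- function(value, group) {
  t(sapply(split(value, group), function(v) {
    c(q1     = unname(quantile(v, 0.25)),
      median = median(v),
      q3     = unname(quantile(v, 0.75)),
      max    = max(v))
  }))
}

# Runs only inside the dashboard, where the setup chunk has loaded avgpm25.
if (exists("avgpm25")) print(group_quartiles(avgpm25$pm25, avgpm25$region))
```

A higher median in one row together with a much larger maximum in the other would match the pattern described in the bullets.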

Histogram Analysis
===

The red vertical line in the histogram marks the current requirement: a maximum PM2.5 level of 12 micrograms per cubic meter.

Column {data-width=500}
---

### Histogram Distribution by County

```{r hist}
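ggplot(avgpm25, aes(x = pm25)) +
  geom_histogram(bins = 30, fill = "blue") +  # 30 is ggplot2's default bin count, made explicit
  geom_vline(xintercept = 12, color = "red") +  # current PM2.5 requirement
  # annotate() draws the label once; geom_text(aes(...)) would redraw it for every row
  annotate("text", x = 14, y = 65, label = "Maximum Allowed Pollution") +
  labs(title = "Distribution of PM2.5 by County", x = "PM2.5 Levels (micrograms/cubic meter)", y = "Count")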
```
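
The cutoff line in the histogram can be complemented by the share of counties that fall above it. The chunk below is a small sketch of that calculation; the helper name `share_above` and the default cutoff of 12 (taken from the requirement described on this page) are assumptions for illustration.

```{r overcutoff}
# Hypothetical helper: proportion of values strictly above a cutoff.
share_above <- function(x, cutoff = 12) mean(x > cutoff, na.rm = TRUE)

# Runs only inside the dashboard, where the setup chunk has loaded avgpm25.
if (exists("avgpm25")) {
  print(c(overall = share_above(avgpm25$pm25),
          tapply(avgpm25$pm25, avgpm25$region, share_above)))
}
```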





Column {.tabset data-width=500}
---

### Histogram Distribution by Region

```{r histbyreg}
ggplot(avgpm25, aes(x = pm25, fill = region)) +
  geom_histogram(bins = 30) +  # 30 is ggplot2's default bin count, made explicit
  facet_grid(rows = vars(region)) +  # one histogram panel per region
  labs(title = "Distribution of PM2.5 by Region", x = "PM2.5 Levels (micrograms/cubic meter)", y = "Count")
```


### Analysis For Region Distribution

- The east has many more counties than the west, as the larger bin counts show.

- The east appears to be slightly skewed left, and the west appears to be skewed right.

- The west has a greater range.
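
The observations about counts, range, and skew can be quantified. The sketch below uses the moment-based sample skewness (third central moment divided by the 1.5 power of the second), which is one common definition among several; negative values indicate a longer left tail and positive values a longer right tail. The helper names `sample_skewness` and `shape_by_group` are hypothetical.

```{r regionskew}
# Hypothetical helper: moment-based sample skewness of x.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical helper: count, range, and skewness of `value` within each `group`.
shape_by_group <- function(value, group) {
  t(sapply(split(value, group), function(v) {
    c(n = length(v), range = diff(range(v)), skewness = sample_skewness(v))
  }))
}

# Runs only inside the dashboard, where the setup chunk has loaded avgpm25.
if (exists("avgpm25")) print(shape_by_group(avgpm25$pm25, avgpm25$region))
```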

Scatterplot Analysis
===

Column {.tabset data-width=500}
---

### Overlapping Distribution of PM2.5 vs Latitude by Region

```{r scat}
ggplot(avgpm25, aes(x=latitude, y=pm25, color=region))+geom_point()+labs(title = "Distribution of PM2.5 vs latitude by Region", y = "PM2.5 Levels (micrograms/cubic meter)", x = "Latitude")
```

### Separated Distribution of PM2.5 vs Latitude by Region
```{r scat2}
ggplot(avgpm25, aes(x=latitude, y=pm25))+geom_point(color="blue")+facet_grid(cols = vars(region))+labs(title = "Distribution of PM2.5 vs latitude by Region", y = "PM2.5 Levels (micrograms/cubic meter)", x = "Latitude")
```

Column {data-width=500}
---

### Analysis

The east appears to have a nonlinear relationship between PM2.5 and latitude, while the west shows no apparent relationship between latitude and pollution.

In both regions, most counties lie at middle latitudes and few lie at the extremes.
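
One way to probe the visual impression above is to compute correlations within each region. The sketch below reports Pearson (linear) and Spearman (rank-based, monotonic) coefficients using base R's `cor()`; the helper name `cor_by_group` is an assumption for illustration. Note that a curved, non-monotonic pattern can leave both coefficients near zero, so these numbers supplement the scatterplots rather than replace them.

```{r latcor}
# Hypothetical helper: Pearson and Spearman correlation of x and y within each group.
cor_by_group <- function(x, y, group) {
  t(sapply(split(seq_along(x), group), function(i) {
    c(pearson  = cor(x[i], y[i]),
      spearman = cor(x[i], y[i], method = "spearman"))
  }))
}

# Runs only inside the dashboard, where the setup chunk has loaded avgpm25.
if (exists("avgpm25")) {
  print(cor_by_group(avgpm25$latitude, avgpm25$pm25, avgpm25$region))
}
```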


Correlogram
===

Column {data-width=500}
---

### Correlogram

```{r corr}
corrgram(avgpm25[, c("latitude", "longitude", "pm25")], lower.panel = panel.pts)
```

Column {data-width=500}
---

### Analysis

None of the variable pairs appears to be strongly correlated. The correlogram does suggest a weak negative linear relationship between latitude and pm25.

It also suggests a positive linear relationship between longitude and pm25, and a negative linear relationship between latitude and longitude.
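
The signs read from the correlogram can be confirmed with the underlying correlation matrix. The chunk below is a minimal sketch using base R's `cor()` with Pearson coefficients; the helper name `cor_table` and the rounding to two digits are assumptions for illustration.

```{r cormatrix}
# Hypothetical helper: rounded Pearson correlation matrix for the chosen columns.
cor_table <- function(df, cols, digits = 2) {
  round(cor(as.data.frame(df)[, cols], use = "complete.obs"), digits)
}

# Runs only inside the dashboard, where the setup chunk has loaded avgpm25.
if (exists("avgpm25")) {
  print(cor_table(avgpm25, c("latitude", "longitude", "pm25")))
}
```

Coefficients close to zero would support the reading that the relationships are weak even where the sign is clear.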